Book Reviews: Semisupervised Learning for Computational Linguistics by Steven Abney

نویسنده

  • Vincent Ng
چکیده

Semi-supervised learning is by no means an unfamiliar concept to natural language processing researchers. Labeled data has been used to improve unsupervised parameter estimation procedures such as the EM algorithm and its variants since the beginning of the statistical revolution in NLP (e.g., Pereira and Schabes 1992). Unlabeled data has also been used to improve supervised learning procedures, the most notable examples being the successful applications of self-training and co-training to word sense disambiguation (Yarowsky 1995) and named entity classification (Collins and Singer 1999). Despite its increasing importance, semi-supervised learning is not a topic that is typically discussed in introductorymachine learning texts (e.g., Mitchell 1997; Alpaydin 2004) or NLP texts (e.g., Manning and Schütze 1999; Jurafsky and Martin 2000).1 Consequently, to learn about semi-supervised learning research, one has to consult the machine-learning literature. This can be a daunting task for NLP researchers who have little background in machine learning. Steven Abney’s book Semisupervised Learning for Computational Linguistics is targeted precisely at such researchers, aiming to provide them with a “broad and accessible presentation” of topics in semi-supervised learning. According to the preamble, the reader is assumed to have taken only an introductory course in NLP “that include[s] statistical methods—concretely the material contained in Jurafsky and Martin (2000) and Manning and Schütze (1999).” Nonetheless, I agree with the author that any NLP researcher who has a solid background in machine learning is ready to “tackle the primary literature on semisupervised learning, and will probably not find this book particularly useful” (page 11). As the author promises, the book is self-contained and quite accessible to those who have little background in machine learning. In particular, of the 12 chapters in the book, three are devoted to preparatory material, including: a brief introduction to machine learning, basic unconstrained and constrained optimization techniques (e.g., gradient descent and the method of Lagrange multipliers), and relevant linear-algebra concepts (e.g., eigenvalues, eigenvectors, matrix and vector norms,

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Semisupervised Learning for Computational Linguistics

Semi-supervised learning is by no means an unfamiliar concept to natural language processing researchers. Labeled data has been used to improve unsupervised parameter estimation procedures such as the EM algorithm and its variants since the beginning of the statistical revolution in NLP (e.g., Pereira and Schabes (1992)). Unlabeled data has also been used to improve supervised learning procedur...

متن کامل

Statistical Methods Statistical Methods#computational Linguistics#machine Learning#stochastic Grammars

" Statistical methods " refers here specifically to statistical methods in computational linguistics. This represents a new body of practice in computational linguistics that has become standard over the last decade.

متن کامل

Analysis of Semi-Supervised Learning with the Yarowsky Algorithm

The Yarowsky algorithm is a rule-based semisupervised learning algorithm that has been successfully applied to some problems in computational linguistics. The algorithm was not mathematically well understood until (Abney 2004) which analyzed some specific variants of the algorithm, and also proposed some new algorithms for bootstrapping. In this paper, we extend Abney’s work and show that some ...

متن کامل

Boosting Applied to Tagging and PP Attachment

Boosting is a machine learning algorithm that is not well known in computational linguistics. We apply it to part-of-speech tagging and prepositional phrase attachment. Performance is very encouraging. We also show how to improve data quality by using boosting to identify annotation errors.

متن کامل

Statistical methods in language processing.

The term statistical methods here refers to a methodology that has been dominant in computational linguistics since about 1990. It is characterized by the use of stochastic models, substantial data sets, machine learning, and rigorous experimental evaluation. The shift to statistical methods in computational linguistics parallels a movement in artificial intelligence more broadly. Statistical m...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008